Getting started with ggplot2

Leonard Blaschek

A quick word about myself

Traditional plotting interfaces

The ‘Grammar of Graphics’

A universal framework that allows creating and combining without limits.

Almost.

The ‘Grammar of Graphics’

Eight components: Data, Coordinates, Mapping, Geometries, Statistics, Scales, Facets, Theme

Scope of this workshop


  1. Load your data into R
  2. Make a handful of publishable plots
  3. Combine them into a multi-panel figure
  4. Save a .pdf in the right size (one/two column)

What we’re not doing today


  • Data wrangling (dplyr/tidyr)
  • Statistical tests
  • 90% of ggplot2

Multiplex CRISPR editing of wood for sustainable fiber production. Sulis DB, […], Barrangou R, Wang JP. 2023. Science 381:216–221. 10.1126/science.add4514


Three panels from main figure 3.

R fundamentals

ggplot()            # function
ggplot              # object
996107              # number
"ggplot"            # string
?ggplot()           # show help page 
library(tidyverse)  # use library() to load the tidyverse package

1. Data import

library(tidyverse)

1. Data import

library(tidyverse)
sulis_bar_data

1. Data import

library(tidyverse)
sulis_bar_data <- read_tsv()

1. Data import

?read_tsv()

Arguments without a default value must be supplied
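A toy illustration of that rule. The function scale_values() and its arguments are made up for this sketch and are not part of readr: x has no default and must be supplied, while factor falls back to its default.

```r
# Hypothetical helper: `x` has no default, `factor` defaults to 2
scale_values <- function(x, factor = 2) {
  x * factor
}

scale_values(3)               # `factor` falls back to its default
scale_values(3, factor = 10)  # the default can be overridden

# Leaving out `x` raises an error; capture its message instead of stopping
missing_arg_message <- tryCatch(
  scale_values(),
  error = function(e) conditionMessage(e)
)
missing_arg_message
```

The same logic applies to read_tsv(): its file argument has no default, so the call fails until a path is given.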

1. Data import

library(tidyverse)
sulis_bar_data <- read_tsv(file = "data/Sulis2023_fig3EF.tsv")

Assign the output of read_tsv("data/Sulis2023_fig3EF.tsv") to the object sulis_bar_data.

1. Data import

library(tidyverse)
sulis_bar_data <- read_tsv("data/Sulis2023_fig3EF.tsv")
sulis_bar_data
# A tibble: 18 × 4
   line     replicate lignin    CL
   <chr>        <dbl>  <dbl> <dbl>
 1 H-4              1   15.5  4.03
 2 H-4              2   16.1  3.75
 3 I-18             1   17.9  2.89
 4 I-18             2   19.5  2.78
 5 I-18             3   22.0  2.92
 6 J-25             1   18.6  3.92
 7 J-25             2   20.7  3.27
 8 K-6              1   12.1  5.97
 9 K-6              2   11.4  6.61
10 K-9              1   23.0  2.95
11 K-9              2   22.6  2.97
12 K-9              3   20.9  3.18
13 K-13             1   22.2  3.04
14 K-13             2   23.0  2.83
15 K-13             3   23.5  2.65
16 Wildtype         1   22.1  2.85
17 Wildtype         2   23.7  2.63
18 Wildtype         3   23.5  2.80

2. Building a plot

Data, Coordinates, Mapping, Geometries, Statistics, Scales, Facets, Theme

bar_plot <- ggplot()

2. Building a plot

panel_E <- ggplot(
  data = sulis_bar_data
)

2. Building a plot

panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = line,
    y = lignin
  )
)

2. Building a plot

panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = line,
    y = lignin
  )
) +
  geom_bar(
    stat = "summary",
    fun = "mean"
  )

panel_E

Figure from the paper.

Our first ggplot.

Which differences can you spot?

Mapping — Reorder x-axis

panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = lignin
  )
) +
  geom_bar(
    stat = "summary",
    fun = "mean"
  )

panel_E
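A minimal sketch of what fct_reorder() does to the factor levels, using made-up toy values rather than the workshop data. By default it summarises the second argument per level with the median; .desc = TRUE puts the highest level first.

```r
library(forcats)

# Toy data (made up for illustration): three lines, two replicates each
line   <- c("a", "a", "b", "b", "c", "c")
lignin <- c(1, 3, 10, 12, 5, 7)

# Without reordering, the levels are alphabetical
levels(factor(line))

# fct_reorder() sorts the levels by a per-level summary of `lignin`
# (the median unless .fun says otherwise); .desc = TRUE reverses the order
reordered <- fct_reorder(line, lignin, .desc = TRUE)
levels(reordered)
```

ggplot2 places discrete x values in level order, which is why wrapping line in fct_reorder() changes the order of the bars.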

Geometries — Errorbars

panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = lignin
  )
) +
  geom_bar(
    stat = "summary",
    fun = "mean"
  ) +
  geom_errorbar(
    stat = "summary",
    fun.data = "mean_se"
  )

panel_E

Geometries — Errorbars

panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = lignin
  )
) +
  geom_errorbar(
    stat = "summary",
    fun.data = "mean_se",
    width = 0.2
  ) +
  geom_bar(
    stat = "summary",
    fun = "mean"
  )

panel_E

Layers are drawn in the order they are added, so earlier layers end up beneath later ones
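A small sketch of the layer-order rule with toy data (the data frame and object names are made up for illustration). The two plots contain the same layers and differ only in the order they are added.

```r
library(ggplot2)

# Toy data: two lines with two replicates each
toy <- data.frame(
  line = c("A", "A", "B", "B"),
  lignin = c(10, 20, 30, 50)
)

base <- ggplot(toy, aes(x = line, y = lignin))

# Points added after the bars are drawn on top of them
points_on_top <- base +
  geom_bar(stat = "summary", fun = "mean") +
  geom_point(colour = "red", size = 3)

# Points added before the bars are partly hidden underneath
points_below <- base +
  geom_point(colour = "red", size = 3) +
  geom_bar(stat = "summary", fun = "mean")

points_on_top
points_below
```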

Or perhaps we show the data?

panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = lignin
  )
) +
  geom_bar(
    stat = "summary",
    fun = "mean"
  ) +
  geom_jitter(width = 0.1)

panel_E

Coordinates — Cut y-axis

library(ggtext)
panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = lignin
  )
) +
  geom_bar(
    stat = "summary",
    fun = "mean"
  ) +
  geom_jitter(width = 0.1) +
  coord_cartesian(
    ylim = c(10, 26),
    expand = FALSE
  )

panel_E

Let’s not do that.

Scales — Fill by severity

panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = lignin
  )
) +
  scale_fill_distiller(
    palette = "Greys"
  ) + # colorbrewer2.org palettes
  geom_bar(
    aes(fill = lignin),
    stat = "summary",
    fun = "mean",
    colour = "black"
  ) +
  geom_jitter(width = 0.1)

panel_E

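That didn’t work: the bar heights (the means) are only computed inside the layer by the summary stat, so the raw lignin values mapped to fill no longer correspond to the bars that get drawn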

Scales — Fill by severity

panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = lignin
  )
) +
  scale_fill_distiller(
    palette = "Greys"
  ) +
  geom_bar(
    aes(fill = after_stat(y)),
    stat = "summary",
    fun = "mean",
    colour = "black"
  ) +
  geom_jitter(width = 0.1)

panel_E

after_stat(y) maps fill to the y variable as it exists after the stat has been computed (in this case the mean per line)
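One way to see what after_stat(y) refers to is to inspect the data the layer holds once the stat has run. This is a sketch with toy values, not the workshop data; layer_data() is a ggplot2 helper that returns the computed data of a layer.

```r
library(ggplot2)

# Toy data: means are 15 for line A and 40 for line B
toy <- data.frame(
  line = c("A", "A", "B", "B"),
  lignin = c(10, 20, 30, 50)
)

p <- ggplot(toy, aes(x = line, y = lignin)) +
  geom_bar(
    aes(fill = after_stat(y)),
    stat = "summary",
    fun = "mean"
  )

# One row per bar; `y` holds the computed mean that fill is mapped to
computed <- layer_data(p, 1)
computed$y
```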

Theme — Fix axis titles

library(ggtext)
panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = lignin
  )
) +
  scale_fill_distiller(
    palette = "Greys"
  ) +
  geom_bar(
    aes(fill = after_stat(y)),
    stat = "summary",
    fun = "mean",
    colour = "black"
  ) +
  geom_jitter(width = 0.1) +
  labs(
    x = NULL,
    y = "<b>Lignin content</b> (% wt)"
  ) +
  theme(
    axis.title.y = element_markdown()
  )

panel_E

ggtext is a ggplot2 extension that renders a subset of Markdown and HTML in plot text

Theme — Rotate axis labels

panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = lignin
  )
) +
  scale_fill_distiller(
    palette = "Greys"
  ) +
  geom_bar(
    aes(fill = after_stat(y)),
    stat = "summary",
    fun = "mean",
    colour = "black"
  ) +
  geom_jitter(width = 0.1) +
  labs(
    x = NULL,
    y = "<b>Lignin content</b> (% wt)"
  ) +
  theme(
    axis.title.y = element_markdown(),
    axis.text.x = element_text(
      angle = 90,
      vjust = 0.5,
      hjust = 1
    )
  )

panel_E

Theme — Remove legend

panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = lignin
  )
) +
  scale_fill_distiller(
    palette = "Greys"
  ) +
  geom_bar(
    aes(fill = after_stat(y)),
    stat = "summary",
    fun = "mean",
    colour = "black"
  ) +
  geom_jitter(width = 0.1) +
  labs(
    x = NULL,
    y = "<b>Lignin content</b> (% wt)"
  ) +
  theme(
    axis.title.y = element_markdown(),
    axis.text.x = element_text(
      angle = 90,
      vjust = 0.5,
      hjust = 1
    ),
    legend.position = "none"
  )

panel_E

Theme — The last details

panel_E <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = lignin
  )
) +
  scale_fill_distiller(
    palette = "Greys"
  ) +
  geom_bar(
    aes(fill = after_stat(y)),
    stat = "summary",
    fun = "mean",
    colour = "black",
    width = 0.8
  ) +
  geom_jitter(
    width = 0.1,
    shape = 21,
    fill = "black",
    colour = "white"
  ) +
  labs(
    x = NULL,
    y = "<b>Lignin content</b> (% wt)"
  ) +
  scale_y_continuous(
    expand = expansion(
      mult = c(0, 0.05)
    )
  ) +
  theme_sulis() +
  theme(
    axis.title.y = element_markdown(),
    axis.text.x = element_text(
      angle = 90,
      vjust = 0.5,
      hjust = 1
    ),
    legend.position = "none"
  )

panel_E

Applying a theme_*() can completely change the look — see how to make your own in the exercise file
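The real theme_sulis() lives in the exercise file and is not shown on the slides. As a rough idea of how such a function can be put together, here is a hypothetical stand-in called theme_sulis_sketch(); the chosen base theme and every setting in it are assumptions, not the workshop's definition.

```r
library(ggplot2)

# Hypothetical stand-in, NOT the workshop's theme_sulis():
# start from a built-in theme and override a few elements
theme_sulis_sketch <- function(base_size = 8) {
  theme_classic(base_size = base_size) +
    theme(
      axis.text = element_text(colour = "black"),
      axis.ticks = element_line(colour = "black"),
      legend.key.size = unit(3, "mm")
    )
}

# Use it like any built-in theme
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_sulis_sketch()
```

To run the slide code without the exercise file, one option is to assign theme_sulis <- theme_sulis_sketch first; the result will look different from the slides.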

Figure from the paper.

Our first ggplot. Text for ants? We’ll fix that later.

Next panel — same code with a few adjustments

panel_F <- ggplot(
  data = sulis_bar_data,
  aes(
    x = fct_reorder(
      line,
      lignin,
      .desc = TRUE
    ),
    y = CL
  )
) +
  scale_fill_distiller(
    palette = "Greys",
    direction = 1
  ) +
  geom_bar(
    aes(fill = after_stat(y)),
    stat = "summary",
    fun = "mean",
    colour = "black",
    width = 0.8
  ) +
  geom_jitter(
    width = 0.1,
    shape = 21,
    fill = "black",
    colour = "white"
  ) +
  labs(
    x = NULL,
    y = "<b>C/L ratio</b>"
  ) +
  scale_y_continuous(
    expand = expansion(
      mult = c(0, 0.05)
    )
  ) +
  theme_sulis() +
  theme(
    axis.title.y = element_markdown(),
    axis.text.x = element_text(
      angle = 90,
      vjust = 0.5,
      hjust = 1
    ),
    legend.position = "none"
  )

panel_F

Multiplex CRISPR editing of wood for sustainable fiber production. Sulis DB, […], Barrangou R, Wang JP. 2023. Science 381:216–221. 10.1126/science.add4514


Three panels from main figure 3.

Load data

sulis_scatter_data <- read_tsv("data/Sulis2023_fig3G.tsv")

The basic scatterplot

panel_K <- ggplot(
  data = sulis_scatter_data,
  aes(
    x = rel_lignin,
    y = rel_volume
  )
) +
  geom_point() +
  theme_sulis()

panel_K

Colour the points

panel_K <- ggplot(
  data = sulis_scatter_data,
  aes(
    x = rel_lignin,
    y = rel_volume
  )
) +
  geom_point(
    aes(colour = type)
  ) +
  scale_colour_manual(
    values = c("blue", "red")
  ) +
  theme_sulis()

panel_K

Colour the points — hex codes

panel_K <- ggplot(
  data = sulis_scatter_data,
  aes(
    x = rel_lignin,
    y = rel_volume
  )
) +
  geom_point(
    aes(colour = type)
  ) +
  scale_colour_manual(
    values = c("#275d95", "#d25952")
  ) +
  theme_sulis()

panel_K

Colour and fill the points

panel_K <- ggplot(
  data = sulis_scatter_data,
  aes(
    x = rel_lignin,
    y = rel_volume
  )
) +
  geom_point(
    aes(
      colour = type,
      fill = after_scale(
        alpha(colour, 0.4)
      )
    ),
    shape = "circle filled"
  ) +
  scale_colour_manual(
    values = c("#275d95", "#d25952")
  ) +
  theme_sulis()

panel_K
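A short look at the alpha() call used inside after_scale(). after_scale() delays the mapping until the colour scale has turned type into actual colours, so alpha() receives a hex code and returns a semi-transparent version of it. The sketch below applies alpha() to one of the slide's hex codes directly.

```r
library(ggplot2)

# alpha() (re-exported by ggplot2 from the scales package) adds
# transparency to a colour and returns an 8-digit hex code (#RRGGBBAA)
outline <- "#275d95"
fill_colour <- alpha(outline, 0.4)

fill_colour
```

The outline keeps the opaque colour while the fill uses the same hue at 40% opacity, which only has a visible effect on shapes that have both, such as "circle filled".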

Reposition the legend

panel_K <- ggplot(
  data = sulis_scatter_data,
  aes(
    x = rel_lignin,
    y = rel_volume
  )
) +
  geom_point(
    aes(
      colour = type,
      fill = after_scale(
        alpha(colour, 0.4)
      )
    ),
    shape = "circle filled"
  ) +
  scale_colour_manual(
    values = c("#275d95", "#d25952")
  ) +
  theme_sulis() +
  theme(
    legend.position = c(0.25, 0.95),
    legend.title = element_blank()
  )

panel_K

Add horizontal line

panel_K <- ggplot(
  data = sulis_scatter_data,
  aes(
    x = rel_lignin,
    y = rel_volume
  )
) +
  geom_hline(
    yintercept = 100,
    colour = "grey",
    linetype = "dashed"
  ) +
  geom_point(
    aes(
      colour = type,
      fill = after_scale(alpha(colour, 0.4))
    ),
    shape = "circle filled"
  ) +
  scale_colour_manual(
    values = c("#275d95", "#d25952")
  ) +
  theme_sulis() +
  theme(
    legend.position = c(0.25, 0.95),
    legend.title = element_blank()
  )

panel_K

The last details

Figure from the paper.

Our third ggplot.

Bonus: label interesting points

# A tibble: 13 × 7
   line  replicate type                label rel_lignin rel_CL rel_volume
   <chr>     <dbl> <chr>               <chr>      <dbl>  <dbl>      <dbl>
 1 WT            1 Wildtype            <NA>       103.    97.2       89.6
 2 WT            2 Wildtype            <NA>        99.6  100.        98.6
 3 WT            3 Wildtype            <NA>        97.9  102.       113. 
 4 H-4           1 CRISPR-edited lines H-4-1       67.0  146.       106. 
 5 H-4           2 CRISPR-edited lines H-4-2       69.9  136.        66.2
 6 H-18          1 CRISPR-edited lines <NA>        70.8  150.        72.6
 7 H-19          1 CRISPR-edited lines <NA>        89.3  105.        85.6
 8 H-19          2 CRISPR-edited lines <NA>        87.6  101.       103. 
 9 H-19          3 CRISPR-edited lines <NA>        79.3  117.        63.2
10 H-19          4 CRISPR-edited lines <NA>        98.8   99.1       52.1
11 H-20          1 CRISPR-edited lines <NA>        95.5  111.        95.0
12 H-20          2 CRISPR-edited lines <NA>        79.8  128.        92.5
13 H-20          3 CRISPR-edited lines <NA>        85.3  130.        75.9

Bonus: label interesting points

library(ggrepel)
panel_K <- ggplot(
  data = sulis_scatter_data,
  aes(
    x = rel_lignin,
    y = rel_volume
  )
) +
  geom_hline(
    yintercept = 100,
    colour = "grey",
    linetype = "dashed"
  ) +
  geom_point(
    aes(
      colour = type,
      fill = after_scale(alpha(colour, 0.4))
    ),
    shape = "circle filled"
  ) +
  geom_label_repel(
    aes(label = label)
  ) +
  scale_colour_manual(
    values = c("#275d95", "#d25952")
  ) +
  labs(
    x = "**Lignin content** (% of wildtype)",
    y = "<b>Stem volume</b> (% of wildtype)"
  ) +
  theme_sulis() +
  theme(
    axis.title.x = element_markdown(),
    axis.title.y = element_markdown(),
    legend.position = c(0.25, 0.95),
    legend.title = element_blank()
  )

panel_K

ggrepel creates labels that automatically avoid overlapping

Bonus: label interesting points

library(ggrepel)
panel_K <- ggplot(
  data = sulis_scatter_data,
  aes(
    x = rel_lignin,
    y = rel_volume
  )
) +
  geom_hline(
    yintercept = 100,
    colour = "grey",
    linetype = "dashed"
  ) +
  geom_point(
    aes(
      colour = type,
      fill = after_scale(alpha(colour, 0.4))
    ),
    shape = "circle filled"
  ) +
  geom_label_repel(
    aes(label = label),
    size = 8
  ) +
  scale_colour_manual(
    values = c("#275d95", "#d25952")
  ) +
  labs(
    x = "**Lignin content** (% of wildtype)",
    y = "<b>Stem volume</b> (% of wildtype)"
  ) +
  theme_sulis() +
  theme(
    axis.title.x = element_markdown(),
    axis.title.y = element_markdown(),
    legend.position = c(0.25, 0.95),
    legend.title = element_blank()
  )

panel_K

Text size in geom_*() layers is given in mm, while theme elements use pt

Bonus: label interesting points

ggtext_size <- 8 / (14 / 5)
library(ggrepel)
panel_K <- ggplot(
  data = sulis_scatter_data,
  aes(
    x = rel_lignin,
    y = rel_volume
  )
) +
  geom_hline(
    yintercept = 100,
    colour = "grey",
    linetype = "dashed"
  ) +
  geom_point(
    aes(
      colour = type,
      fill = after_scale(alpha(colour, 0.4))
    ),
    shape = "circle filled"
  ) +
  geom_label_repel(
    aes(label = label),
    size = ggtext_size
  ) +
  scale_colour_manual(
    values = c("#275d95", "#d25952")
  ) +
  labs(
    x = "**Lignin content** (% of wildtype)",
    y = "<b>Stem volume</b> (% of wildtype)"
  ) +
  theme_sulis() +
  theme(
    axis.title.x = element_markdown(),
    axis.title.y = element_markdown(),
    legend.position = c(0.25, 0.95),
    legend.title = element_blank()
  )

panel_K

14 to 5 approximates the ratio of pt to mm (ggplot2 exports the exact value as .pt)
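A worked version of the conversion, comparing the 14/5 shortcut with ggplot2's exported constant .pt (points per millimetre). The variable names are made up for this sketch.

```r
library(ggplot2)

# ggplot2 exports the points-per-millimetre constant as `.pt`
.pt

target_pt <- 8                       # desired label size in points

size_exact  <- target_pt / .pt       # exact conversion
size_approx <- target_pt / (14 / 5)  # approximation used on the slide

c(exact = size_exact, approx = size_approx)
```

The two values differ only slightly, so either works for matching label text to 8 pt theme text.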

Bonus: label interesting points

ggtext_size <- 8 / (14 / 5)
library(ggrepel)
panel_K <- ggplot(
  data = sulis_scatter_data,
  aes(
    x = rel_lignin,
    y = rel_volume
  )
) +
  geom_hline(
    yintercept = 100,
    colour = "grey",
    linetype = "dashed"
  ) +
  geom_point(
    aes(
      colour = type,
      fill = after_scale(alpha(colour, 0.4))
    ),
    shape = "circle filled"
  ) +
  geom_label_repel(
    aes(label = label),
    size = ggtext_size,
    label.size = NA,
    fill = rgb(1, 1, 1, 0.5),
    min.segment.length = 0
  ) +
  scale_colour_manual(
    values = c("#275d95", "#d25952")
  ) +
  labs(
    x = "**Lignin content** (% of wildtype)",
    y = "<b>Stem volume</b> (% of wildtype)"
  ) +
  theme_sulis() +
  theme(
    axis.title.x = element_markdown(),
    axis.title.y = element_markdown(),
    legend.position = c(0.25, 0.95),
    legend.title = element_blank()
  )

panel_K

3. Assembling a figure

library(patchwork)
panel_E + panel_F + panel_K

patchwork automatically aligns plots into multi-panel figures

Adjust relative widths

library(patchwork)
panel_E + panel_F + panel_K +
  plot_layout(widths = c(1, 1, 1.75))

Add panel labels

library(patchwork)
panel_E + panel_F + panel_K +
  plot_layout(widths = c(1, 1, 1.75)) +
  plot_annotation(tag_levels = "A")

Format panel labels

library(patchwork)
panel_E + panel_F + panel_K +
  plot_layout(widths = c(1, 1, 1.75)) +
  plot_annotation(tag_levels = list(c("E", "F", "K"))) &
  theme(plot.tag = element_text(size = 10, face = "bold"))
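A self-contained sketch of the two patchwork operators, using placeholder panels from the built-in mtcars data instead of the workshop panels. In patchwork's documented usage, plot_layout() and plot_annotation() are added to the assembled figure with +, while & applies a theme to every panel.

```r
library(ggplot2)
library(patchwork)

# Three placeholder panels built from a built-in dataset
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(hp, mpg)) + geom_point()
p3 <- ggplot(mtcars, aes(disp, mpg)) + geom_point()

figure <- p1 + p2 + p3 +
  plot_layout(widths = c(1, 1, 1.75)) +           # layout of the whole figure
  plot_annotation(tag_levels = "A") &             # tags, also added with `+`
  theme(plot.tag = element_text(face = "bold"))   # `&` reaches every panel

figure
```

Because + binds more tightly than & in R, the theme() after & is applied to the complete assembly.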

4. Saving plots

Save to pdf

ggsave(
  "images/fig3.pdf",
  width = 180,
  height = 60,
  units = "mm"
)

PDFs are vector graphics, so they scale without losing resolution and are easily edited in Inkscape/Illustrator
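A sketch of saving at a fixed physical size. The column widths below are assumptions (check the target journal's guidelines), and the plot is a placeholder. Passing plot = explicitly avoids relying on whichever plot was displayed last.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Assumed column widths in mm; journals differ
single_column_mm <- 90
double_column_mm <- 180

# Write to a temporary file so the sketch leaves no files behind
out_file <- file.path(tempdir(), "fig_sketch.pdf")

ggsave(
  out_file,
  plot = p,                  # explicit, rather than relying on last_plot()
  width = single_column_mm,
  height = 60,
  units = "mm"
)

file.exists(out_file)
```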

Save to png

library(ragg)
ggsave(
  "images/fig3.png",
  width = 180,
  height = 60,
  units = "mm",
  device = agg_png
)

ragg and its devices (agg_tiff, agg_jpeg, agg_png) improve raster graphics text rendering

Exercises!

Open up 2023_ggplot2_exercises.rmd and give it a try

Some pointers

Resources to go further